Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate

Authors

Abstract

In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy of the state distribution induced by finite-horizon trajectories is a sensible target. Especially, we present a novel and practical policy-search algorithm, Maximum Entropy POLicy optimization (MEPOL), to learn a policy that maximizes a non-parametric, $k$-nearest neighbors estimate of the state distribution entropy. In contrast to known methods, MEPOL is completely model-free as it requires neither to estimate the state distribution of any policy nor to model transition dynamics. Then, we empirically show that MEPOL allows learning a maximum-entropy exploration policy in high-dimensional, continuous-control domains, and how this policy facilitates learning meaningful reward-based tasks downstream.
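To make the idea of a non-parametric, $k$-nearest neighbors entropy estimate concrete, the sketch below computes a Kozachenko-Leonenko-style estimate from a batch of visited states. It is only an illustration under stated assumptions: the function names `knn_entropy` and `digamma_int`, the Euclidean distance, and the brute-force neighbor search are choices made here, and the estimator used in the paper may differ in its exact form (for example, in how samples are weighted during policy optimization).

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(states: np.ndarray, k: int = 4) -> float:
    """Kozachenko-Leonenko-style k-NN estimate of differential entropy (in nats).

    `states` is an (N, d) array of sampled states. This is an illustrative
    sketch, not the paper's implementation.
    """
    states = np.asarray(states, dtype=float)
    n, d = states.shape
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")

    # Brute-force pairwise Euclidean distances, shape (N, N).
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # After sorting each row, column 0 is the zero self-distance,
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]

    # Log-volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1).
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)

    # Small constant guards against log(0) when samples coincide.
    mean_log_dist = float(np.mean(np.log(kth + 1e-12)))
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * mean_log_dist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spread_out = rng.uniform(size=(1000, 2))  # states covering the unit square
    clustered = 0.1 * spread_out              # same states squeezed into a corner
    print(knn_entropy(spread_out), knn_entropy(clustered))
```

States that cover more of the space yield larger neighbor distances and therefore a higher estimate, which is the quantity an entropy-maximizing exploration policy would try to increase. The brute-force distance matrix costs O(N^2) memory, so a tree-based neighbor search would be a natural replacement for larger batches.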


Similar resources

State-Dependent Exploration for Policy Gradient Methods

Policy Gradient methods are model-free reinforcement learning algorithms which in recent years have been successfully applied to many real-world problems. Typically, Likelihood Ratio (LR) methods are used to estimate the gradient, but they suffer from high variance due to random exploration at every time step of each training episode. Our solution to this problem is to introduce a state-depende...


A Non-parametric Maximum Entropy Clustering

Clustering is a fundamental tool for exploratory data analysis. Information theoretic clustering is based on the optimization of information theoretic quantities such as entropy and mutual information. Recently, since these quantities can be estimated in a non-parametric manner, non-parametric information theoretic clustering has gained much attention. Assuming the dataset is sampled from a certain cl...


A non-parametric method to estimate the number of clusters

An important and yet unsolved problem in unsupervised data clustering is how to determine the number of clusters. The proposed slope statistic is a non-parametric and data-driven approach for estimating the number of clusters in a dataset. This technique uses the output of any clustering algorithm and identifies the maximum number of groups that breaks down the structure of the dataset. Intensi...


Non-parametric Entropy Estimation Toolbox (NPEET)

This document describes a package of Python code for implementing various non-parametric continuous entropy estimators (and some discrete ones for convenience). After describing installation, Sec. 4 provides a wide-ranging discussion of technical, theoretical, and numerical issues surrounding entropy estimation. Sec. 5 provides references to the relevant literature for each estimator implemente...


State agnostic planning graphs: deterministic, non-deterministic, and probabilistic planning

Planning graphs have been shown to be a rich source of heuristic information for many kinds of planners. In many cases, planners must compute a planning graph for each element of a set of states, and the naive technique enumerates the graphs individually. This is equivalent to solving an all-pairs shortest path problem by iterating a single-source algorithm over each source. We introduce a stru...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i10.17091